Changes to support pass@k evaluation on the HumanEval dataset #1180

shubhra · 2023-08-11T14:56:20Z

Example:

numactl -C0-15 python deepsparse/src/deepsparse/transformers/eval_downstream.py \
        <model_path> \
        --num-cores 16 \
        --dataset openai_humaneval \
        --humaneval-method pass_at_k \
        --engine deepsparse \
        --start 0 \
        --max-samples 2

This evaluates a subset of the HumanEval dataset that starts at index 0 (--start) and contains 2 samples (--max-samples).
If the benchmark-humaneval argument is supplied, the evaluation instead runs on a pre-selected subset of 11 samples, and start and max-samples are ignored.
Set humaneval-method to perplexity to evaluate perplexity instead of pass@k.
Add --n-solutions <n> to set the number of solutions per task. The default is 1.
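
For reference, this description does not show how the pass@k score itself is computed. A common definition is the unbiased estimator from the paper that introduced HumanEval: for a task with n generated solutions of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below assumes that definition; the function names `pass_at_k` and `mean_pass_at_k` are illustrative and are not taken from eval_downstream.py.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task.

    n: number of solutions generated for the task (assumed to match --n-solutions)
    c: number of those solutions that pass the task's tests
    k: the k in pass@k, must satisfy 1 <= k <= n
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # If fewer than k solutions fail, any k-sized draw contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random k-subset contains at least one passing solution.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs, one per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Under this definition k cannot exceed n, so with the default of one solution per task only pass@1 can be estimated.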

Note: Remove numactl -C0-15 if you don't need to specify which cores to run on.

Changes to support pass at k evaluation on the HumanEval dataset

56b022d

shubhra marked this pull request as draft August 11, 2023 14:56

shubhra changed the title ~~Changes to support pass at k evaluation on the HumanEval dataset~~ Changes to support pass@k evaluation on the HumanEval dataset Aug 11, 2023
